
 Ocala


M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

arXiv.org Artificial Intelligence

Document visual question answering (DocVQA) pipelines that answer questions from documents have broad applications. Existing methods focus on handling single-page documents with multi-modal language models (MLMs), or rely on text-based retrieval-augmented generation (RAG) that uses text extraction tools such as optical character recognition (OCR). However, there are difficulties in applying these methods in real-world scenarios: (a) questions often require information across different pages or documents, where MLMs cannot handle many long documents; (b) documents often have important information in visual elements such as figures, but text extraction tools ignore them. We introduce M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts (closed-domain and open-domain), question hops (single-hop and multi-hop), and evidence modalities (text, chart, figure, etc.). M3DocRAG finds relevant documents and answers questions using a multi-modal retriever and an MLM, so that it can efficiently handle single or many documents while preserving visual information. Since previous DocVQA datasets ask questions in the context of a specific document, we also present M3DocVQA, a new benchmark for evaluating open-domain DocVQA over 3,000+ PDF documents with 40,000+ pages. In three benchmarks (M3DocVQA/MMLongBench-Doc/MP-DocVQA), empirical results show that M3DocRAG with ColPali and Qwen2-VL 7B outperforms many strong baselines, including state-of-the-art performance in MP-DocVQA. We provide comprehensive analyses of different indexing, MLMs, and retrieval models. Lastly, we qualitatively show that M3DocRAG can successfully handle various scenarios, such as when relevant information exists across multiple pages and when answer evidence only exists in images.


MiMiC: Minimally Modified Counterfactuals in the Representation Space

arXiv.org Artificial Intelligence

Language models often exhibit undesirable behaviors, such as gender bias or toxic language. Interventions in the representation space have been shown to be effective in mitigating such issues by altering the LM behavior. We first show that two prominent intervention techniques, Linear Erasure and Steering Vectors, do not enable a high degree of control and are limited in expressivity. We then propose a novel intervention methodology for generating expressive counterfactuals in the representation space, aiming to make representations of a source class (e.g., "toxic") resemble those of a target class (e.g., "non-toxic"). This approach, generalizing previous linear intervention techniques, utilizes a closed-form solution for the Earth Mover's problem under Gaussian assumptions and provides theoretical guarantees on the representation space's geometric organization. We further build on this technique and derive a nonlinear intervention that enables controlled generation. We demonstrate the effectiveness of the proposed approaches in mitigating bias in multiclass classification and in reducing the generation of toxic language, outperforming strong baselines.
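For orientation, the closed-form solution mentioned in the abstract is presumably the standard optimal transport map between two Gaussians under quadratic cost. The abstract does not state the formula, so the notation below (source statistics $\mu_s, \Sigma_s$, target statistics $\mu_t, \Sigma_t$, map $T$) is an assumption and may differ from the paper's own.

```latex
% Assumed setting: source representations x ~ N(\mu_s, \Sigma_s),
% target representations ~ N(\mu_t, \Sigma_t), with \Sigma_s invertible.
\[
  T(x) = \mu_t + A\,(x - \mu_s),
  \qquad
  A = \Sigma_s^{-1/2}
      \left( \Sigma_s^{1/2}\, \Sigma_t\, \Sigma_s^{1/2} \right)^{1/2}
      \Sigma_s^{-1/2}.
\]
% A is symmetric positive semi-definite and satisfies
% A \Sigma_s A = \Sigma_t, so T maps N(\mu_s, \Sigma_s) onto N(\mu_t, \Sigma_t).
```

When the two covariances are equal, $A$ reduces to the identity and $T$ becomes a pure mean shift, $T(x) = x + (\mu_t - \mu_s)$. That is one way to read the abstract's claim that the approach generalizes earlier linear interventions such as steering vectors.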


Does Outrage Signal Cyber Attacks? Predicting "Bad Behavior" from Sentiment in Online Content

AAAI Conferences

We demonstrate that it is possible to leverage big data in the form of tweets and linked webpages to find expressions of sentiment that signal "bad behavior" such as cyber attacks. We hypothesize that expressions of "outrage" (high intensity, negative affect sentiment) against an organization in public data may be predictive of cyber attacks for two reasons: 1) threat actors may be motivated to launch an attack based on anger/discontent, and 2) outrage associated with an organization or industry may increase the likelihood of that organization or industry being victimized by threat actors (i.e., as a form of "vigilante justice"). We measure sentiment in online content and determine trends in public emotion and their correlation to trends in cyber attacks, as reported in Hackmageddon. We demonstrate that dimensions of sentiment, as afforded by our use of the Circumplex model of emotion, do yield correlations to reported cyber attacks, but differ depending on the domain of the data. Thus, the use of this technique requires careful analysis for optimal application.
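As a rough illustration of correlating a sentiment trend with an attack trend, the sketch below computes a Pearson correlation between two equally spaced time series, optionally shifting the attack series by a lag. The function names, the lag parameter, and the sample numbers are all hypothetical. The abstract does not say which correlation measure or time alignment the authors used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, attacks, lag=0):
    """Correlate sentiment in period t with attacks in period t + lag.

    The lag is an illustrative assumption, not something taken from the paper.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, attacks)
    # Drop the last `lag` sentiment values and the first `lag` attack counts
    # so that each sentiment value is paired with the attacks `lag` periods later.
    return pearson(sentiment[:-lag], attacks[lag:])


if __name__ == "__main__":
    # Hypothetical weekly outrage scores and reported attack counts.
    outrage = [0.1, 0.4, 0.3, 0.8, 0.5, 0.2]
    attacks = [2, 3, 5, 4, 9, 6]
    for lag in (0, 1):
        print(lag, round(lagged_correlation(outrage, attacks, lag), 3))
```

Comparing the coefficient at several lags is one simple way to check whether sentiment tends to lead, rather than merely accompany, reported incidents. A correlation of this kind does not by itself establish predictive value.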


Cyberdyne's HAL Exoskeleton Helps Patients Walk Again in First Treatments at U.S. Facility

IEEE Spectrum Robotics

Danny Bal was riding his brand new motorcycle to work from his home in Ocala, Florida two years ago when the driver of an oncoming car fell asleep and ploughed into Bal's electric-blue bike. After the accident, which crushed three of Bal's thoracic vertebrae and shredded a spinal nerve, Bal adjusted to life in a wheelchair. He added a motorized lift to his beloved F-250 truck, explored local trails with a hand-powered bike, and joined a therapeutic horseback riding program. Now, one of Bal's daughters is about to get married, and 57-year-old Bal wants to walk in her ceremony. So on a recent Friday morning in December at Brooks Rehabilitation in Jacksonville, Florida, Bal was back on his feet, taking slow but steady steps as his granddaughter cheered from the sidelines.


An Ostrich-Like Robot Pushes the Limits of Legged Locomotion

MIT Technology Review

What looks like a tiny mechanical ostrich chasing after a car is actually a significant leap forward for robot-kind. The clever and simple two-legged robot, known as the Planar Elliptical Runner, was developed at the Institute for Human and Machine Cognition in Ocala, Florida, to explore how mechanical design can be used to enable sophisticated legged locomotion. A video produced by the researchers shows the robot being tested in a number of situations, including on a treadmill and running behind and alongside a car with a helping hand from an engineer. In contrast to many other legged robots, this one doesn't use sensors and a computer to help balance itself. Instead, its mechanical design provides dynamic stability as it runs.


Government regulators are looking into fatal Tesla crash involving Autopilot

#artificialintelligence

Tesla announced today that the National Highway Traffic Safety Administration has opened an investigation into a recent fatal crash of a Model S with the company's Autopilot feature activated. The accident took place on May 7th in a small West Florida town called Williston. The Florida Highway Patrol is also conducting its own investigation of the accident, according to a public affairs officer there. The same officer reported that Tesla has, since the fatal accident in May, sent engineers down to Ocala, Florida to assist investigators in accessing data they needed to evaluate the causes of the crash. Tesla offered an account of the event in a blog post titled "A Tragic Loss" that went up today, detailing the crash, an "extremely rare circumstance," which occurred on a divided highway.


Speech Adaptation in Extended Ambient Intelligence Environments

AAAI Conferences

This Blue Sky presentation focuses on a major shift toward a notion of "ambient intelligence" that transcends general applications targeted at the general population. The focus is on highly personalized agents that accommodate individual differences and changes over time. This notion of Extended Ambient Intelligence (EAI) concerns adaptation to a person's preferences and experiences, as well as changing capabilities, most notably in an environment where conversational engagement is central. An important step in moving this research forward is the accommodation of different degrees of cognitive capability (including speech processing) that may vary over time for a given user, whether through improvement or through deterioration. We suggest that the application of divergence detection to speech patterns may enable adaptation to a speaker's increasing or decreasing level of speech impairment over time. Taking an adaptive approach toward technology development in this arena may be a first step toward empowering those with special needs so that they may live with a high quality of life. It also represents an important step toward a notion of ambient intelligence that is personalized beyond what can be achieved by mass-produced, one-size-fits-all software currently in use on mobile devices.


Deterioration of Speech as an Indicator of Physiological Degeneration (DESIPHER)

AAAI Conferences

Most physiological assessments commonly used to determine the functional status of patients with Amyotrophic lateral sclerosis (ALS) require trained clinical personnel to administer and interpret the results. Speech impairments eventually affect 80-95% of patients with ALS (Beukelman, 2011). Initial impairments include reduced speaking ... Our speech research focuses on the detection of dialectal variations by identifying speech language divergences along a range of different dimensions. We borrow the notion of divergence from the study of cross-linguistic variations (Dorr, 1993) and apply it towards developing an assessment of bulbar function in patients with ALS, to improve upon existing assessments (Green et al., 2013).


Companion-Based Ambient Robust Intelligence (CARING)

AAAI Conferences

We present a Companion-based Ambient Robust INtelliGence (CARING) system, for communication with, and support of, clients with Traumatic Brain Injury (TBI) or Amyotrophic Lateral Sclerosis (ALS). A central component of this system is an artificial companion, combined with a range of elements for ambient intelligence. The companion acts as a personalized intermediary for multi-party communication between the client, the environment (e.g., a Smart Home), caregivers and health professionals. CARING is based on tightly coupled systems drawing from natural language processing, speech recognition and adaptation, deep language understanding and constraint-based knowledge representation and reasoning. A major innovation of the system is its ability to adapt and accommodate different interfaces associated with different client capabilities and needs. The system will use, as a proxy, different interaction requirements of clients (e.g., Brain-Computer Interfaces) at different stages of ALS progression and with different types of TBI impairments. Ultimately, this technology is expected to improve the quality of life for clients through conversation with a computer.


Shared Awareness, Autonomy and Trust in Human-Robot Teamwork

AAAI Conferences

Teamwork requires mutual trust among team members. Establishing and maintaining trust depends upon alignment of mental models, an aspect of shared awareness. We present a theory of how maintenance of model alignment is integral to fluid changes in relative control authority (i.e., adaptive autonomy) in human-robot teamwork.